AITopics | separability condition

Collaborating Authors

separability condition

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Discrete Rényi Classifiers

Meisam Razaviyayn, Farzan Farnia, David Tse

Neural Information Processing SystemsOct-9-2025, 14:00:15 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, classifier, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks

Cai, Yuhang, Zhou, Kangjie, Wu, Jingfeng, Mei, Song, Lindsey, Michael, Bartlett, Peter L.

arXiv.org Machine LearningFeb-21-2025

Deep networks often have an enormous amount of parameters an d are theoretically capable of overfitting the training data. However, in practice, deep networks trai ned via gradient descent (GD) or its variants often generalize well. This is commonly attributed to the implicit bias of GD, in which GD finds a certain solution that prevents overfitting ( Zhang et al., 2021; Neyshabur et al., 2017; Bartlett et al., 2021). Understanding the implicit bias of GD is one of the central topics in deep learni ng theory. The implicit bias of GD is relatively well-understood when t he network is homogeneous (see Soudry et al., 2018; Ji and Telgarsky, 2018; Lyu and Li, 2020; Ji and Telgarsky, 2020; Wu et al., 2023, and references therein). For linear networks trained on linearly separabl e data, GD diverges in norm while converging in direction to the maximum margin solution ( Soudry et al., 2018; Ji and Telgarsky, 2018; Wu et al., 2023). Similar results have been established for generic homogene ous networks that include a class of deep networks, assuming that the network at initialization can sepa rate the training data.

assumption 1, ji and telgarsky, nullxnull, (15 more...)

arXiv.org Machine Learning

2502.16075

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsFeb-8-2025, 12:30:46 GMT

The paper was interesting to read and contained many nice ideas, intuitions, and theoretical connections. The technical contributions in Sections 3-5 are sound and clever, although in some cases building on previous works [2,4]. Moreover, the problem formulation as a linear optimization and the least squared solution make the approach computationally favorable. However, there are points that need more clarification/elaboration in the paper too, in order to better judge the practical application and impact of the work. For instance, it's never discussed where the set of low order marginal come from.

author feedback and meta-review, contribution, export review, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.43)

Add feedback

Discrete Rényi Classifiers

Neural Information Processing SystemsMar-13-2024, 05:15:50 GMT

When the probability distribution P(X, Y) is known, the optimal classifier, leading to the minimum misclassification rate, is given by the Maximum A-posteriori Probability (MAP) decision rule. However, in practice, estimating the complete joint distribution P(X, Y) is computationally and statistically impossible for large values of d. Therefore, an alternative approach is to first estimate some low order marginals of the joint probability distribution P(X, Y) and then design the classifier based on the estimated low order marginals. This approach is also helpful when the complete training data instances are not available due to privacy concerns. In this work, we consider the problem of finding the optimum classifier based on some estimated low order marginals of (X, Y).

classifier, decision rule, misclassification rate, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Learning Topic Models: Identifiability and Finite-Sample Analysis

Chen, Yinyin, He, Shishuang, Yang, Yun, Liang, Feng

arXiv.org Machine LearningOct-8-2021

Topic models provide a useful text-mining tool for learning, extracting and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, a formal theoretical investigation on the statistical identifiability and accuracy of latent topic estimation is lacking in the literature. In this paper, we propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood, which is naturally connected to the concept of volume minimization in computational geometry. Theoretically, we introduce a new set of geometric conditions for topic model identifiability, which are weaker than conventional separability conditions relying on the existence of anchor words or pure topic documents. We conduct finite-sample error analysis for the proposed estimator and discuss the connection of our results with existing ones. We conclude with empirical studies on both simulated and real datasets.

algorithm, convex polytope, separability condition, (16 more...)

arXiv.org Machine Learning

2110.04232

Country:

North America > United States > Ohio (0.04)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > Iowa (0.04)
(5 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Voting & Elections (0.92)

Add feedback

Recovering Joint Probability of Discrete Random Variables from Pairwise Marginals

Ibrahim, Shahana, Fu, Xiao

arXiv.org Machine LearningJun-30-2020

Learning the joint probability of random variables (RVs) lies at the heart of statistical signal processing and machine learning. However, direct nonparametric estimation for high-dimensional joint probability is in general impossible, due to the curse of dimensionality. Recent work has proposed to recover the joint probability mass function (PMF) of an arbitrary number of RVs from three-dimensional marginals, leveraging the algebraic properties of low-rank tensor decomposition and the (unknown) dependence among the RVs. Nonetheless, accurately estimating three-dimensional marginals can still be costly in terms of sample complexity, affecting the performance of this line of work in practice in the sample-starved regime. Using three-dimensional marginals also involves challenging tensor decomposition problems whose tractability is unclear. This work puts forth a new framework for learning the joint PMF using only pairwise marginals, which naturally enjoys a lower sample complexity relative to the third-order ones. A coupled nonnegative matrix factorization (CNMF) framework is developed, and its joint PMF recovery guarantees under various conditions are analyzed. Our method also features a Gram-Schmidt (GS)-like algorithm that exhibits competitive runtime performance. The algorithm is shown to provably recover the joint PMF up to bounded error in finite iterations, under reasonable conditions. It is also shown that a recently proposed economical expectation maximization (EM) algorithm guarantees to improve upon the GS-like algorithm's output, thereby further lifting up the accuracy and efficiency. Real-data experiments are employed to showcase the effectiveness.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

2006.16912

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Oregon > Benton County > Corvallis (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Add feedback

Shallow Neural Network can Perfectly Classify an Object following Separable Probability Distribution

Min, Youngjae, Chung, Hye Won

arXiv.org Machine LearningApr-19-2019

Guiding the design of neural networks is of great importance to save enormous resources consumed on empirical decisions of architectural parameters. This paper constructs shallow sigmoid-type neural networks that achieve 100% accuracy in classification for datasets following a linear separability condition. The separability condition in this work is more relaxed than the widely used linear separability. Moreover, the constructed neural network guarantees perfect classification for any datasets sampled from a separable probability distribution. This generalization capability comes from the saturation of sigmoid function that exploits small margins near the boundaries of intervals formed by the separable probability distribution.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Machine Learning

1904.09109

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Multi-User Multi-Armed Bandits for Uncoordinated Spectrum Access

Bande, Meghana, Veeravalli, Venugopal V.

arXiv.org Machine LearningJul-2-2018

The existing spectrum management paradigm treats frequency spectrum as a fixed commodity, which leads to spectrum under utilization. Cognitive radio has emerged as a useful strategy to increase spectrum utilization. The existing literature on cognitive radio has largely been focused on the primary/secondary user paradigm, where secondary users need to detect vacant spectrum when available and vacate the occupied spectrum when a primary user wants to transmit. We focus on a different type of spectrum sharing system in which there is no distinction between users, and in which there is no coordination among the users. The collective performance across all users is more important than that of individual users. This is in contrast to the typical primary/secondary user paradigm in which secondary users bear the responsibility for ensuring priority-based spectrum sharing.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

1807.00867

Country: North America > United States > Illinois (0.04)

Genre: Research Report (0.40)

Industry: Telecommunications (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.66)

Add feedback

Nonnegative Matrix Factorization for Signal and Data Analytics: Identifiability, Algorithms, and Applications

Fu, Xiao, Huang, Kejun, Sidiropoulos, Nicholas D., Ma, Wing-Kin

arXiv.org Machine LearningMar-3-2018

Nonnegative matrix factorization (NMF) has become a workhorse for signal and data analytics, triggered by its model parsimony and interpretability. Perhaps a bit surprisingly, the understanding to its model identifiability---the major reason behind the interpretability in many applications such as topic mining and hyperspectral imaging---had been rather limited until recent years. Beginning from the 2010s, the identifiability research of NMF has progressed considerably: Many interesting and important results have been discovered by the signal processing (SP) and machine learning (ML) communities. NMF identifiability has a great impact on many aspects in practice, such as ill-posed formulation avoidance and performance-guaranteed algorithm design. On the other hand, there is no tutorial paper that introduces NMF from an identifiability viewpoint. In this paper, we aim at filling this gap by offering a comprehensive and deep tutorial on model identifiability of NMF as well as the connections to algorithms and applications. This tutorial will help researchers and graduate students grasp the essence and insights of NMF, thereby avoiding typical `pitfalls' that are often times due to unidentifiable NMF formulations. This paper will also help practitioners pick/design suitable factorization tools for their own problems.

artificial intelligence, identifiability, machine learning, (16 more...)

arXiv.org Machine Learning

1803.01257

Country:

North America > United States > Virginia (0.28)
North America > United States > Minnesota (0.28)

Genre:

Instructional Material > Course Syllabus & Notes (0.67)
Research Report (0.63)

Industry:

Health & Medicine (0.67)
Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Mixture Proportion Estimation via Kernel Embedding of Distributions

Ramaswamy, Harish G., Scott, Clayton, Tewari, Ambuj

arXiv.org Machine LearningMay-31-2016

Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component. This problem constitutes a key part in many "weakly supervised learning" problems like learning with positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing. While there have been several methods proposed to solve this problem, to the best of our knowledge no efficient algorithm with a proven convergence rate towards the true proportion exists for this problem. We fill this gap by constructing a provably correct algorithm for MPE, and derive convergence rates under certain assumptions on the distribution. Our method is based on embedding distributions onto an RKHS, and implementing it only requires solving a simple convex quadratic programming problem a few times. We run our algorithm on several standard classification datasets, and demonstrate that it performs comparably to or better than other algorithms on most datasets.

data mining, machine learning, mixture proportion estimation, (10 more...)

arXiv.org Machine Learning

1603.02501

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback